


Spreading on a complex network avoiding certain motifs 






Tomás Alarcón and Henrik Jeldtoft Jensen

Institute of Mathematical Sciences, 53 Princes' Gate, 
Imperial College London, London SW7 2PG & Department of Mathematics, 
Imperial College London, South Kensington campus, London SW7 2AZ, UK 

(Dated: March 6, 2009) 

Spreading of either information or matter can often be treated as a network problem. It can 
be of great importance to be able to estimate the likelihood that spreading through a network 
reaches essentially the entire network while still not reaching certain sub-classes of the network. We 
show that excluding nodes and edges from the network has a subtle effect on the percolation. We 
study two specific examples of degree distributions (exponential and scale free) for which analytical 
solutions can be obtained. The two cases exhibit qualitatively different behavior. 

When information, or matter, spreads through a network,
the local dynamics will typically be related to the local
community structure. This may be because the quantity that
flows has an effect on the state of the local sub-community,
or it may be that information flowing through a network
triggers a particular type of response when it reaches
certain local topological sub-network structures. One may
think of resonance effects if the flow involves activation
of dynamical variables on the nodes. Other examples can be
found in the effect of unbalanced triangles in social
networks [2] or in frustration effects when the dynamics
corresponds to optimization. The prototype of the latter
is the role played by frustrated loops, e.g. triangles,
when optimizing an anti-ferromagnetic energy functional on
a network [3]. Similarly, in sociology or epidemiology it
can be of great interest to know how likely it is that a
macroscopic proportion of the population will be touched by
a spreading quantity while certain sub-populations remain
untouched. In the same vein, the fact that information
tends to get trapped within communities [4, 5] provides a
rationale for trying, under the right circumstances, to
avoid such communities.

The general problem of estimating the probability that 
spreading on a network reaches a macroscopic part of the 
entire network, while a certain type of motif remains
untouched, is obviously very complicated in its full
generality. To make some initial headway we consider the
special problem of spreading on random networks
and compute the probability that the flow percolates to a
macroscopic fraction of the network while avoiding a cer- 
tain fraction of triangular motifs. We are able to express 
this probability in terms of the degree distribution and 
the edge clustering coefficients. 

Our main objective is to analyze how exclusion of triangular
motifs affects percolation on weakly-clustered networks,
i.e. how the spreading manages to reach a macroscopic part
of the network without touching a given subset of triangular
motifs. Typically one would expect that removing edges and
nodes will make it harder for a process to spread across the
network. This we confirm. However, the scenario is complex.
For networks with an exponential degree distribution two
types of behavior exist. For high edge-multiplicity the
average degree at which the spreading percolates on the
decimated network increases as a function of the number of
removed triangles. In contrast, for low multiplicity the
onset of percolation depends non-monotonically on the number
of excluded triangles. Scale-free networks, in contrast,
exhibit only the monotonic behavior.

Model - We now consider a network characterized by 
the following two quantities: the degree distribution P(k) 
and the nodal clustering coefficient c(k). In the limit of 
weak clustering, we are able to express the conditions 
for percolation without hitting the designated triangles 
in terms of these two quantities. The procedure of our 
computation consists of removing, by random sampling, 
a certain proportion T < 1 of all the triangular motifs 
of the original network. The statistical characteristics of 
the sampled network are expressed in terms of the dis- 
tributions for the original network. Next the percolation 
process on the sampled network is studied and the perco- 
lation threshold calculated. This calculation allows us to 
determine under which conditions the spreading process 
percolates on the original network while leaving at least 
a fraction 1 - T of the triangular motifs untouched. The approach
we use consists of removing those nodes (and the
corresponding edges) that belong to the proportion T of 
triangles within the original network. The network gen- 
erated by the remaining nodes and edges (the sampled 
network) consists thus of the proportion of the network 
that does not include any of the designated "no-go" mo- 
tifs. The analysis of percolation on the sampled network 
allows us to study the spreading processes on the origi- 
nal network which stay clear of certain communities. In 
order to advance this program, we need to calculate how 
the relevant quantities on the sampled network relate to 
the original one. In other words, we need to parametrize 
the sampled network in terms of P(k) and c(k). Stumpf
& Wiuf have extensively analyzed the properties of binomial
sampling of complex networks in relation to the validity of
inferring properties of the entire network from those of a
(randomly sampled) sub-network [6, 7]. These authors have
dealt mostly with the effect of binomial sampling on the
degree distribution. Here, we extend the analysis to motif
sampling and also consider the effect of sampling on the
nodal clustering coefficient. We present only a brief
summary of our main results. An extended treatment including
the derivations in full detail will be presented elsewhere [8].
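
The sampling procedure described above (designate a fraction T of the triangles as "no-go" motifs, then delete every node belonging to one of them) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the adjacency-dictionary representation, the function names `find_triangles` and `remove_no_go_motifs`, and the choice of marking each triangle independently with probability T are all assumptions made here.

```python
import random
from itertools import combinations


def find_triangles(adj):
    """Return every triangle of an undirected graph as a sorted 3-tuple."""
    triangles = set()
    for u in adj:
        # Two neighbours of u that are themselves linked close a triangle.
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                triangles.add(tuple(sorted((u, v, w))))
    return sorted(triangles)


def remove_no_go_motifs(adj, T, rng):
    """Mark each triangle as 'no-go' with probability T (an assumption),
    then delete every node of a marked triangle together with its edges."""
    marked = [t for t in find_triangles(adj) if rng.random() < T]
    doomed = {node for t in marked for node in t}
    return {u: {v for v in nbrs if v not in doomed}
            for u, nbrs in adj.items() if u not in doomed}


# Toy graph (hypothetical): triangle 0-1-2 with a tail 2-3-4.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
sampled = remove_no_go_motifs(adj, T=1.0, rng=random.Random(0))
print(sampled)
```

With T = 1 the single triangle is always marked, so only the edge between nodes 3 and 4 survives; with T = 0 the graph is returned unchanged.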

We briefly summarize the effect of sampling on the degree
distribution as analyzed by Stumpf & Wiuf [6]. Define the
quantity $p = \frac{1}{\langle k \rangle} \sum_k k\,Q(k)P(k)$,
where Q(k) is the probability of sampling a node with degree
k. The quantity p is the parameter of the binomial
distribution that determines whether a node is removed or
not. Accordingly, within the sampled network a node has
degree l with probability $P_s(l)$ given by:

$P_s(l) = \sum_{k \ge l} \binom{k}{l} p^l (1-p)^{k-l} \pi(k)$   (1)

where $\pi(k)$ is given by $\pi(k) = Q(k)P(k)/\langle Q \rangle$, and
$\langle Q \rangle = \sum_k Q(k)P(k)$ is the total weight, relative to
the full network, that is actually sampled. The generating
function corresponding to the sampled network, $G_s(x)$, is
given by:

$G_s(x) = \sum_{k=0}^{\infty} \pi(k)(1 - p + px)^k = G_\pi(1 - p + px)$
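
The generating-function identity can be checked numerically. The sketch below assumes the standard binomial-thinning form for the sampled degree distribution, P_s(l) = sum over k >= l of pi(k) C(k,l) p^l (1-p)^(k-l), which is the form implied by G_s(x) = G_pi(1 - p + px). The distribution `pi`, the value of `p` and the function names are hypothetical and chosen only for illustration.

```python
from math import comb


def sampled_degree_distribution(pi, p):
    """Binomially thin a degree distribution pi (list indexed by degree k)."""
    kmax = len(pi) - 1
    return [sum(pi[k] * comb(k, l) * p**l * (1 - p)**(k - l)
                for k in range(l, kmax + 1))
            for l in range(kmax + 1)]


def generating_function(dist, x):
    """G(x) = sum_k dist[k] * x**k."""
    return sum(prob * x**k for k, prob in enumerate(dist))


pi = [0.1, 0.4, 0.3, 0.2]   # hypothetical pi(k) for k = 0..3
p = 0.7                     # hypothetical edge-retention parameter
ps = sampled_degree_distribution(pi, p)

x = 0.35
lhs = generating_function(ps, x)              # G_s(x)
rhs = generating_function(pi, 1 - p + p * x)  # G_pi(1 - p + p x)
print(lhs, rhs)
```

The two printed values agree to rounding error, and `ps` remains normalized, as expected from the binomial theorem.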

Let us now assume that a random fraction T < 1 of all the
triangles within the network are chosen and designated as
the motifs to avoid. The probability of a node being sampled
is given by $Q(k) = (1 - T c(k))^{k(k-1)/2}$, where c(k) is
the nodal clustering coefficient. This quantity can be
interpreted as the probability that a node of degree k
belongs to a triangle [9].
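
To make the roles of Q(k), pi(k) and p concrete, the following sketch computes them for a small degree distribution. It assumes Q(k) = (1 - T c(k))^(k(k-1)/2), pi(k) = Q(k)P(k)/<Q> and p = (1/<k>) sum_k k Q(k) P(k); the function names, the example distribution `P` and the example clustering function `c` are hypothetical.

```python
def node_retention(k, T, c):
    """Q(k): probability that a node of degree k lies in no removed triangle.

    Assumes Q(k) = (1 - T*c(k)) ** (k*(k-1)/2); a node with k < 2 cannot
    belong to a triangle, so it is always retained.
    """
    if k < 2:
        return 1.0
    return (1.0 - T * c(k)) ** (k * (k - 1) / 2)


def sampling_parameters(P, T, c):
    """Return (Q, pi, p) for a degree distribution P given as a list over k."""
    Q = [node_retention(k, T, c) for k in range(len(P))]
    mean_Q = sum(q * pk for q, pk in zip(Q, P))
    mean_k = sum(k * pk for k, pk in enumerate(P))
    pi = [q * pk / mean_Q for q, pk in zip(Q, P)]
    p = sum(k * q * pk for k, (q, pk) in enumerate(zip(Q, P))) / mean_k
    return Q, pi, p


def c(k):
    """Hypothetical nodal clustering coefficient, only called for k >= 2."""
    return 0.1 / (k - 1) ** 2


P = [0.0, 0.5, 0.3, 0.2]    # hypothetical P(k) for k = 0..3
Q, pi, p = sampling_parameters(P, T=0.5, c=c)
print(Q, pi, p)
```

For T = 0 no triangle is designated, so Q(k) = 1 for every k, pi(k) reduces to P(k) and p = 1.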

The analytical approach we develop below assumes the weak
clustering condition, i.e. the average edge multiplicity
$m_0 < 1$. Serrano & Boguñá [10] have argued that this
condition can be expressed in terms of the clustering
coefficient as $c(k) < c_0/(k-1)$. We will assume that
$c(k) = c_0/(k-1)^\alpha$ with $\alpha \ge 2$. Under these
conditions, Q(k) can be expanded in powers of T c(k). To
first order, we obtain:

$Q(k) \simeq 1 - T\,\frac{k(k-1)}{2}\,c(k)$   (2)
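
A quick numerical comparison shows how close the first-order expression is to the full one under weak clustering. The sketch assumes Q(k) = (1 - T c(k))^(k(k-1)/2) with c(k) = c_0/(k-1)^alpha; the parameter values and function names are hypothetical.

```python
def c_nodal(k, c0, alpha):
    """Assumed nodal clustering coefficient c(k) = c0 / (k-1)**alpha, k >= 2."""
    return c0 / (k - 1) ** alpha


def q_exact(k, T, c0, alpha):
    """Full expression Q(k) = (1 - T c(k)) ** (k(k-1)/2)."""
    return (1 - T * c_nodal(k, c0, alpha)) ** (k * (k - 1) / 2)


def q_first_order(k, T, c0, alpha):
    """First-order expansion Q(k) ~ 1 - T k(k-1)/2 c(k)."""
    return 1 - T * k * (k - 1) / 2 * c_nodal(k, c0, alpha)


T, c0, alpha = 0.5, 0.1, 2   # hypothetical values
for k in (2, 5, 20, 100):
    exact = q_exact(k, T, c0, alpha)
    approx = q_first_order(k, T, c0, alpha)
    print(k, exact, approx, exact - approx)
```

For these values the two expressions differ by well under one percent, and the expansion never exceeds the full expression (Bernoulli's inequality).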



We now turn to the analysis of the effect of sampling on
different measures of clustering. In particular, we consider
the nodal clustering coefficient, c(k), and the edge
clustering coefficient, c(k,k'). Hereafter, we will assume
the original network to be uncorrelated, and we are
therefore limited to studying weakly clustered networks [10].



Triangular motifs are removed uniformly at random,
independently of the degrees of the nodes composing the
triangle. Therefore, the number of triangles within the
degree class k in the sampled network, $T_s(k)$, is given
in terms of the corresponding number of triangles in the
original network, T(k), by $T_s(k) = (1 - T)\,T(k)$. Taking
into account that T(k) and c(k) must satisfy
$T(k) = \frac{1}{2} N P(k)\,k(k-1)\,c(k)$, Eq. (1) leads to the
following expression relating the nodal clustering
coefficient of the sampled network, $c_s(k)$, to that of
the original network:
where $\langle c \rangle = \sum_k c(k)P(k)$. We have taken into account
that the number of nodes of the sampled network is given
by $N_s = (1 - T\langle c \rangle)N$.
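
As a hedged sketch of how such a relation can arise (not necessarily the form the authors obtain), assume that the same counting relation between triangles and nodal clustering holds in the sampled network, with N replaced by N_s and P(k) by P_s(k). Combining it with the uniform removal of triangles gives:

```latex
\begin{align}
T_s(k) &= (1 - T)\,T(k), \\
T(k)   &= \tfrac{1}{2}\, N\, P(k)\, k(k-1)\, c(k), \qquad
T_s(k)  = \tfrac{1}{2}\, N_s\, P_s(k)\, k(k-1)\, c_s(k), \\
\Longrightarrow\quad
c_s(k) &= (1 - T)\,\frac{N}{N_s}\,\frac{P(k)}{P_s(k)}\, c(k)
        = \frac{1 - T}{1 - T\langle c \rangle}\,\frac{P(k)}{P_s(k)}\, c(k).
\end{align}
```

The numerical prefactor in the counting relation cancels in the ratio, so the result does not depend on it; the last step uses N_s = (1 - T<c>)N.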

The edge clustering coefficient, c(k,k'), corresponds to
the probability that two nodes joined by an edge, one of
degree k and the other of degree k', share a common neighbor
[10]. In other words, it can be interpreted as the
probability that a link joining these two nodes is an edge
of a triangle. It is defined as $c(k,k') = m_{kk'}/m^c_{kk'}$,
where $m_{kk'}$ is the average multiplicity of the edges
linking degree classes k and k', and
$m^c_{kk'} = \min(k,k') - 1$ is its maximum value.
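
For a single edge, the multiplicity is simply the number of common neighbours of its two end nodes, i.e. the number of triangles the edge takes part in. The sketch below computes this per-edge quantity and normalizes it by min(k,k') - 1, the largest number of common neighbours an edge between nodes of degree k and k' can have. The class-averaged c(k,k') would be the mean over all edges joining the two degree classes. The graph and the function names are hypothetical.

```python
def edge_multiplicity(adj, u, v):
    """Number of triangles the edge (u, v) takes part in."""
    return len(adj[u] & adj[v])


def edge_clustering(adj, u, v):
    """Per-edge multiplicity divided by its maximum, min(k, k') - 1."""
    k_min = min(len(adj[u]), len(adj[v]))
    if k_min < 2:
        return 0.0  # an end node of degree 1 cannot close a triangle
    return edge_multiplicity(adj, u, v) / (k_min - 1)


# Toy graph (hypothetical): triangle 0-1-2 with a tail 2-3-4.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
for u, v in [(0, 1), (0, 2), (2, 3), (3, 4)]:
    print((u, v), edge_multiplicity(adj, u, v), edge_clustering(adj, u, v))
```

Edges inside the triangle have multiplicity 1 and are saturated, while the tail edges have multiplicity 0.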

Serrano & Boguñá [10] have shown that networks can only be
considered uncorrelated when clustering is weak. In this
case $m_{kk'} = m_0$, with $m_0 < 1$ independent of k and k',
and therefore $c(k,k') = m_0/m^c_{kk'}$. The corresponding
edge clustering coefficient on the sampled network will
therefore be given by $c_s(k,k') = m_s/m^c_{kk'}$, where
$m_s$ is a constant to be determined in terms of T, $c_0$
and $\alpha$, with $c(k) = c_0/(k-1)^\alpha$. To do this we
use the following equation relating $m_{kk'}$ and $c_s(k)$ [10]:
where $P_s(k,k') = k P_s(k) P_s(k'|k)/\langle k \rangle$, with
$P_s(k'|k)$ the conditional probability that a neighbor of
a node of degree k has degree k'. Now, summing over all
k > 1 and using Eq. (3), we have
where $m_s$ is the average edge multiplicity on the sampled
network, and $\langle k \rangle$ and $\langle k \rangle_u$ are the average
degrees in the sampled and un-sampled networks, respectively.

The parameter $P_s(1,1)$ appearing in Eq. (5) is given
by [10]:

FIG. 1: The analytic result for the percolation threshold κ (corresponding to the critical value of the cut-off parameter β in
Eqs. (8) and (9)) for networks with exponential (left) and scale-free (right) degree distribution. Comparing the two panels
highlights the different response of these two types of networks to the sampling process: whereas exponential networks
show two different types of behavior depending on the edge multiplicity of the original network (i.e. monotonic dependence of
κ on T for large clustering as opposed to non-monotonic dependence for lower clustering), scale-free networks always exhibit
the same behavior (i.e. percolation being hindered by sampling) regardless of the edge multiplicity of the original network.
Color code: black lines (inset) correspond to the unsampled exponential network for different values of the average multiplicity
m, orange lines correspond to m = 0.2 (only shown for scale-free networks), blue lines to m = 0.4 for different values of T,
green lines correspond to m = 0.6, and red lines correspond to m = 0.8.

where $A_{k=1} = 1 - \langle k \rangle P(1,1)/P(1)$ and
$A_{k>1} = 1 - 2P(1)/\langle k \rangle + P(1,1)$ [10].

At this point we are ready to study the percolation process
on the sampled network by applying the conditions derived
in [10] for weakly clustered networks.
Next we present results for two particular cases in
which analytical results can be obtained [11], namely, an
exponential network with a degree distribution given by:



(8)



and a scale-free network characterized by a degree
distribution given by [11]:

In both cases, we consider $c(k) = c_0/(k-1)^2$. Under
these assumptions, closed analytical expressions for
$G_s(x)$ can be obtained.



It is important to point out that the spreading process we
consider is different from the percolation process on a
network where the average multiplicity is reduced from
$m_0$ to $(1 - T)m_0$, i.e. percolation on a network with
the same characteristics as the original network (same
number of nodes, same number of edges, etc.), but with a
given number of triangles removed. For this process the
percolation threshold is reduced and, thus, percolation is
facilitated. We denote by κ the critical value of β for
which the equality sign in the percolation condition holds.
Fig. 1 shows that, for a weakly clustered network, the
percolation threshold κ on the restricted network is
systematically bigger than the one corresponding to
unrestricted percolation and bigger than the corresponding
percolation thresholds in networks with lower clustering
(see Fig. 1). This is mainly due to the effective removal of
nodes and edges: although overall clustering is lowered,
which in principle should favor the onset of percolation,
nodes and edges are removed at the same time, which acts
against percolation. In spite of this overall trend,
networks with an exponential degree distribution show a
complex behavior under sampling. Fig. 1 demonstrates that
the effect on percolation depends on both the average
edge-multiplicity of the unsampled network, $m_0$, and the
probability of sampling, T. We notice that, for larger
values of $m_0$, the onset of percolation is hindered as
the network is more densely pruned (i.e. as T increases).
On the contrary, for smaller values of $m_0$, the onset of
percolation does not depend monotonically on T: there exists
a value of T for which the corresponding percolation
threshold reaches a maximum and then starts decreasing, so
that percolation is favored by further pruning. The reason
for this behavior is that, in the latter case, a balance
between network pruning and the related reduction in
clustering is reached, beyond which the decrease in
clustering induced by further pruning outweighs the
corresponding loss of connectivity. In the former case,
where clustering is stronger, this regime cannot be reached.
These results are confirmed by the computer simulations
shown in Fig. 2, where the probability of percolation, P,
is calculated for different values of $m_0$ and T. We
observe that, for $m_0 = 0.4$, this quantity is bigger for
T = 1 than it is for T = 0.8, whereas for $m_0 = 0.6$, P is
a monotonically increasing function of T, in agreement with
our analytic results.

FIG. 2: Simulation results for weakly clustered networks with
exponential degree distribution under sampling for the
percolation probability P as a function of
$q_c = \langle k(k-1) \rangle/\langle k \rangle$. In agreement with the analytical
results shown in Fig. 1, for lower clustering (panel (a),
corresponding to m = 0.4) P is bigger for T = 0.8 than for
T = 1. In contrast, for larger clustering (panel (b),
corresponding to m = 0.6), P depends monotonically on T.
Color code: black lines (circles) correspond to the sampled
network with T = 0, brown lines (triangles pointing up)
correspond to T = 0.4, blue lines (triangles pointing down)
correspond to T = 0.6, green lines (squares) correspond to
T = 0.8, and red lines (diamonds) correspond to T = 1.
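
The control parameter used in Fig. 2, the ratio <k(k-1)>/<k>, is easy to evaluate for any degree distribution. The sketch below does so for a truncated exponential distribution P(k) proportional to exp(-k/β) with k >= 1. This functional form, the truncation `kmax` and the function names are assumptions made for illustration only, since the exact form of Eq. (8) is not reproduced here. For unclustered, uncorrelated random networks a giant component exists when this ratio exceeds 1 (the Molloy-Reed criterion); clustering and motif removal shift that threshold.

```python
from math import exp


def exponential_degree_distribution(beta, kmax):
    """Assumed form P(k) ~ exp(-k/beta) for 1 <= k <= kmax, normalized."""
    weights = [0.0] + [exp(-k / beta) for k in range(1, kmax + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]


def branching_ratio(dist):
    """Return <k(k-1)> / <k> for a degree distribution given as a list over k."""
    mean_k = sum(k * p for k, p in enumerate(dist))
    mean_kk1 = sum(k * (k - 1) * p for k, p in enumerate(dist))
    return mean_kk1 / mean_k


for beta in (0.5, 1.0, 2.0):
    dist = exponential_degree_distribution(beta, kmax=200)
    print(beta, branching_ratio(dist))
```

For this assumed distribution the ratio equals 2r/(1 - r) with r = exp(-1/β), up to a negligible truncation error, so it grows monotonically with the cut-off parameter β.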

Let us now turn to how the effect of the sampling process
depends on the nature of the degree distribution. Comparing
the left and right panels of Fig. 1, it is clear that
exponential and scale-free networks exhibit qualitatively
different behavior. In particular, we observe that the
non-monotonic dependence of the percolation threshold on the
parameter T, observed in the former case, is absent in the
case of scale-free networks: even at lower values of the
average multiplicity $m_0$, the percolation threshold in
sampled scale-free networks increases monotonically with
the proportion of triangles removed. This difference may be
useful for probing empirical networks and may help to
discern whether their degrees are distributed according to
an exponential or to a scale-free distribution (with
cut-off).

Summary and discussion - We have studied spreading 
and percolation restricted to a part of a network. In our 
case we specifically avoid a certain fraction of triangles. 
For the case of weakly-clustered, uncorrelated networks 
we have given a full analytical description of the sam- 
pled networks in terms of the parameters of the origi- 
nal network. We have demonstrated that although edge 
and node removal may hinder percolation, the removal 
of triangles can lower the percolation threshold in situa- 
tions where fewer triangles lead to less clustering. This 
effect depends on the functional form of the degree dis- 
tribution and is found in exponential networks but not in 
scale-free networks. In scale-free networks we found that 
motif removal has a strong effect for moderate values of
the average edge-multiplicity, in which case the percolation
threshold increases dramatically with the removal of
triangles. The reason for this is that, with a finite
cut-off β in the power-law degree distribution, low-degree
nodes dominate and are the nodes most likely to participate
in the formation of triangles. This results in a largely
disconnected network in which percolation is not possible.
Percolation can only be obtained by increasing β, as this
allows for nodes with larger degrees, which are less likely
to form triangles and, therefore, to be removed from the
network.

The strong dependence on the degree distribution of 
restricted spreading can presumably be a useful way to 
probe the topological nature of big networks. 